Message Understanding Conference (MUC) Tests of Discourse Processing
Authors
Abstract
Performance evaluations of NLP systems have been designed and conducted that require systems to extract certain prespecified information about events and entities. A single text may describe multiple events and entities, and the evaluation task requires the system to resolve references to produce the expected output. We describe an early attempt to use the results from an information extraction evaluation to provide insight into the relationship between the difficulty of discourse processing and performance on the information extraction task. We then discuss an upcoming noun phrase coreference evaluation that has been designed independently of any other evaluation task in order to establish a clear performance benchmark on a small set of discourse phenomena.
Background on the MUC Evaluations
Five Message Understanding Conferences have been held since 1987 (Sundheim and Chinchor 1993), and a sixth is planned for 1995 (Grishman 1994). Each conference serves as a forum for reporting on a multisite evaluation of text understanding systems. Out of the experiences of the community of evaluators and evaluation participants has grown a basic paradigm of black-box testing based on an information extraction task. In this paradigm, the systems under test process a designated set of texts and fill slots in a template database according to prespecified rules. The domain of the test and the prespecified rules are developed with the interests of the research community (technical challenge), the evaluators (evaluability), and potential customers (utility) in mind.
For example, the domain of MUC-3 and MUC-4 (Chinchor, Hirschman, and Lewis 1993) was terrorist activity in nine Latin American countries. The systems had to analyze news articles and determine whether a reported event was a terrorist event and, if so, who had done what to whom.
The systems then put this information into a template containing slots such as event type, perpetrator, target, and effect. A study of discourse-related aspects of the MUC-4 information extraction task (Hirschman 1992) is summarized in this paper.
The MUC-6 evaluation, which is scheduled to be conducted in the fall of 1995, will include a modified version of an information extraction task, and it will also include two text-tagging tasks. One of the text-tagging tasks is to identify some types of coreference relations. The design of this task is described in a later section of this paper.
Testing Event Tracking in an Information Extraction Context
Representatives of eight MUC-3 sites authored a joint paper on discourse processing in the MUC-3 systems (Iwanska et al. 1991). The paper describes the capabilities of the MUC-3 systems in the following three areas:
1. Identifying portions of text that describe different domain events (recognizing single event vs. multiple events)
2. Resolving references:
a. pronoun references
b. proper name references
c. definite references
3. Discourse representation
The group concluded that the tasks of recognizing a single event and distinguishing multiple events were the most important aspects of discourse-related processing for the information extraction task. While most systems did perform reference resolution, they did so in various ways. Most systems did not produce an explicit discourse representation. All but one author believed that handling discourse-related phenomena was the area that would yield the most improvement in performance. Extensions to the following processes and data were believed to be means of recognizing a single event and distinguishing multiple events: reference resolution (particularly, distributed definite anaphora and vague references), temporal and spatial reasoning, ambiguity resolution, semantic criteria for merging events, and general world knowledge.
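The distinction between a single event and multiple events can be illustrated with a small sketch. It is not taken from any MUC system: the slot names follow the template slots mentioned above (event type, perpetrator, target, effect), while `EventTemplate`, `compatible`, `merge`, and `group_events` are hypothetical names, and the merging criterion (two partial descriptions belong to the same event when no slot filled in both has conflicting values) is a deliberately naive assumption standing in for the richer semantic criteria the group had in mind.

```python
from dataclasses import dataclass
from typing import List, Optional

# Slot names mirror the template slots named in the text; everything else
# in this sketch is a hypothetical illustration.
SLOTS = ("event_type", "perpetrator", "target", "effect")


@dataclass
class EventTemplate:
    event_type: Optional[str] = None
    perpetrator: Optional[str] = None
    target: Optional[str] = None
    effect: Optional[str] = None


def compatible(a: EventTemplate, b: EventTemplate) -> bool:
    """Assumed toy criterion: no slot filled in both templates may conflict."""
    for slot in SLOTS:
        x, y = getattr(a, slot), getattr(b, slot)
        if x is not None and y is not None and x != y:
            return False
    return True


def merge(a: EventTemplate, b: EventTemplate) -> EventTemplate:
    """Combine two compatible templates, preferring a's fill when both exist."""
    fills = {}
    for slot in SLOTS:
        x = getattr(a, slot)
        fills[slot] = x if x is not None else getattr(b, slot)
    return EventTemplate(**fills)


def group_events(partials: List[EventTemplate]) -> List[EventTemplate]:
    """Greedily fold each partial description into the first compatible event."""
    events: List[EventTemplate] = []
    for p in partials:
        for i, e in enumerate(events):
            if compatible(e, p):
                events[i] = merge(e, p)
                break
        else:
            # No existing event is compatible, so treat this as a new event.
            events.append(p)
    return events
```

For instance, two sentence-level fragments that both report a bombing with non-conflicting fills would be folded into one template, while a fragment reporting a kidnapping would start a second one. A greedy first-match strategy like this is order-sensitive and would over-merge distinct events of the same type, which is exactly why temporal and spatial reasoning and world knowledge were listed as needed extensions.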
To explore further the effects of discourse on perfor...
From: AAAI Technical Report SS-95-06. Compilation copyright © 1995, AAAI (www.aaai.org). All rights reserved.